Abstract

Analysis of sequences obtained from Saanich Inlet using mothur and QIIME2 revealed peak community alpha-diversity at a depth of approximately 100m which contains approximately 38uM of dissolved oxygen. Lowest diversity was observed at greater depth and with lower oxygen levels. Further analysis of taxonomic levels revealed Proteobacteria as the most abundant phylum within all samples. In order to investigate how microbial communities differ across depth and oxygen gradients within the Saanich Inlet, we focused on the phylum Chloroflexi. Analysis revealed a positive correlation between depth and Chloroflexi abundance. This correlation was significant only when analysis was based on QIIME2-generated ASVs, and not mothur-generated OTUs. In addition, there exists a negative correlation between oxygen concentration and Chloroflexi abundance. Similarly, significance was only reported using QIIME2-generated ASVs. The analysis of data using QIIME2 identified four classes within the phylum Chloroflexi: Dehalococcoidia, Anaerolineae, SAR202, and JG30-KF-CM66, while mothur identified two classes within this phylum: Anaerolineae, and SAR202. Lastly, changes in the abundance of OTUs and ASVs were correlated with depth and oxygen concentration. No correlation was found to be significant. The abundances of 24 (of 34) OTUs and 38 (of 47) ASVs were positively correlated with depth, while all OTUs and 46 ASVs were negatively correlated with oxygen concentration. However, the presence of outliers in data analyzed by both mothur and QIIME2 may have biased the trend line and reduced model significances. Although, the absence of sufficient data limits us from classifying them as outliers. The differences observed between mothur and QIIME2 indicate the role played by the choice of pipeline to analyze results.

Introduction

Saanich Inlet is a seasonally anoxic fjord [1] located between Vancouver Island and the Saanich Peninsula. It is 24 km long and has a basin of up to 234 meters in depth [2]. It has a 75-meter sill which acts to protect the deeper waters [3]. Because of this sill and the constantly high input of organic material from freshwater discharge and primary production in surface waters, its conditions below 110 meters are anoxic [3]. Oxygen is replenished dependent on the season, mostly in the fall, which modifies the oxygen gradient and thereby the environmental conditions for the microbial community that inhabit the inlet [3]. Dissolved oxygen increases gradually from a minimum concentration at higher depth up to its peak concentration at the surface due to phytoplankton metabolism and atmospheric surface waters gas exchange [3]. Nitrate reduction by denitrifiers happens mostly in the deep water following oxygenation [3]. This results in a steep nitrate gradient when looking at the different depths within the fjord [3]. A study by Zaikova et al. found that microbial diversity was highest in the hypoxic transition area and that it decreases within the anoxic basin waters [1]. It is vital to study the roles of various microorganisms within Saanich Inlet in order to understand how they affect environmental conditions like greenhouse gases, methane, and denitrogenation on a larger scale in the world’s oceans [3].

Operational Taxonomic Units (OTUs) are defined as clusters of organisms that have been grouped based on DNA sequence similarity of a specific DNA segment known as a taxonomic marker gene [4]. The grouped DNA sequences differ by less than a fixed and arbitrary sequence dissimilarity threshold, often 3% [5]. This process of clustering on a specific DNA segment, known as DNA barcoding, allows for rapid, targeted, and high throughput analysis of genetic variation in a specific genomic region such as 16s/18s rRNA sequences, leading to large scale characterization of microbial communities [4, 6]. However, new recent amplicon sequence variants (ASVs) methods have been developed with finer resolution and are independent of dissimilarity thresholds that have been used to define OTUs. ASV methods have shown higher specificity and sensitivity in comparison to OTU methods as they distinguish sequence variants as small as single nucleotides and denoise the sequences by discriminating biological sequences from errors. This is done based on the expectation that biological sequences are more abundant and more repeatedly observed than error-containing sequences [5].

Using OTU and ASV data for samples collected from the Saanich Inlet, we investigated how microbial communities differ across depth and oxygen gradients within the Saanich Inlet, with a particular focus on the phylum Chloroflexi. We found Chloroflexi of interest because its members are highly abundant in marine sediments [7] and present a broad spectrum of metabolic characteristics such as anoxygenic photosynthesis [8], obligate aerobic and anaerobic heterotrophy [9], and even predation with a gliding motility [10]. Like many other microbes, members of Chloroflexi can be a challenge to grow in culture, with some classes yet to be cultured successfully, which has made characterizing their metabolisms a challenge [11, 19]. However, new sequencing technologies have made it possible to analyze the genome of and characterize these uncultured microbes [11-19].

In our Saanich Inlet data, we were able to identify four classes within the phylum Chloroflexi: Dehalococcoidia, Anaerolineae, SAR202, and JG30-KF-CM66. Members of the class Dehalococcoidia are widely distributed throughout marine sediments [11] and anoxic deep waters [12]. Dehalococcoidia grow via anaerobic organohalide respiration and are extensively studied for their potential in the bioremediation of chloride-contaminated water and soil [11, 12]. As for the class Anaerolineae, despite its members being prevalent in various ecosystems, only a few strains have been successfully cultured [13]. Anaerolineae compose one of the core populations of anaerobic bacteria involved in anaerobic digestion and possess key genes for catalyzing cellulose hydrolysis [14].The SAR202 cluster was one of the earliest discoveries of marine bacteria which inhabited the aphotic zone [15], and since then SAR202 has been found to be ubiquitous throughout the deep ocean [16]. Members of SAR202 are involved in metabolizing organosulfur compounds and likely play a major role in sulfur cycling [17]. JG30-KF-CM66 is a relatively uncharacterized clade of acidobacteria, but it has been identified in soft coal slags [18] and anoxic ocean water [19]. The characteristics of each of these classes impact how the spatial distribution of the Chloroflexi phylum differs within Saanich Inlet, and we also set out to determine if and how different sequence analysis pipelines would impact these biological conclusions.

Methods

Sample Collection and Processing

Water samples from 16 depths (10-200m) from cruise 72 were collected at station S3 (48°35.500 N, 123°30.300 W) onboard MSV John Strickland. Geochemical and multi-omic information, which included 16S rRNA gene amplicon sequences (V4-5 hypervariable regions) and dissolved O2, were extracted for each depth [20, 21]. Data from 7 depths (10, 100, 120, 135, 150, 165, 200m) were further analyzed. Dissolved O2 was measured onboard by the Sea-Bird SBE 43 Photosynthetically Active Radiation sensor [20]. 2L of water sample at each depth was filtered onto 0.22μm Strerivix filters, and stored until amplicon sequencing, which was carried out on the Illumina MiSeq platform at the Joint Genome Institute. Base qualities were encoded in Phred33, and primers 515F and 806R were used for 16S rRNA gene amplification [21].

Data Processing

Reads were independently processed using mothur and QIIME2-based pipelines, which cluster sequences based on OTUs and ASVs respectively. Resultant data was constructed as phyloseq objects for downstream analysis in R.

Mothur pipeline

Sequenced reads were first assembled into contigs, which were screened and de-duplicated so that the remainder 1) were between 20bp and 600bp long, 2) had fewer than 8 homopolymers and 3) had no ambiguous bases.

make.file(inputdir=[filePath]/Saanich, prefix=Saanich)

make.contigs(file=Saanich.files, processors=10)

summary.seqs(fasta=Saanich.trim.contigs.fasta)

screen.seqs(fasta=Saanich.trim.contigs.fasta, group=Saanich.contigs.groups, maxambig=0, maxhomop=8, minlength=200, maxlength=600)

unique.seqs(fasta=Saanich.trim.contigs.good.fasta)

summary.seqs(fasta=Saanich.trim.contigs.good.unique.fasta, count=Saanich.trim.contigs.good.count_table)

Configs were trimmed so that they only align to bases 10368 to 25434 in the SILVA databases (release 128), and uninformative bases were removed. Resultant sequences were de-duplicated again.

align.seqs(fasta=Saanich.trim.contigs.good.unique.fasta, reference=silva.nr_v128.align, flip=T, processors=10)

summary.seqs(fasta=Saanich.trim.contigs.good.unique.align, count=Saanich.trim.contigs.good.count_table)

screen.seqs(fasta=Saanich.trim.contigs.good.unique.align, count=Saanich.trim.contigs.good.count_table, summary=Saanich.trim.contigs.good.unique.summary, start=10368, end=25434, processors=10)

filter.seqs(fasta=Saanich.trim.contigs.good.unique.good.align, vertical=T, trump=.)

unique.seqs(fasta=Saanich.trim.contigs.good.unique.good.filter.fasta, count=Saanich.trim.contigs.good.good.count_table)

Sequences were then pre-clustered and clustered de novo (using 97% sequence similarity) to determine the final OTUs. Chimeric sequences were also filtered away.

pre.cluster(fasta=Saanich.trim.contigs.good.unique.good.filter.unique.fasta, count=Saanich.trim.contigs.good.unique.good.filter.count_table, diffs=3)

summary.seqs(fasta=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.count_table)

chimera.uchime(fasta=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)

remove.seqs(fasta=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.count_table, accnos=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.accnos)

dist.seqs(fasta=Saanich.final.fasta, processors=15)

cluster.split(column=Saanich.final.dist, count=Saanich.final.count_table, method=opti, processors=10, large=T)

make.shared(list=Saanich.final.opti_mcc.unique_list.list, count=Saanich.final.count_table, label=0.03)

Clusters were classified using the SILVA database, and resulting taxonomies were condensed.

classify.seqs(fasta=Saanich.final.fasta, count=Saanich.final.count_table, template=silva.nr_v128.align, taxonomy=silva.nr_v128.tax, cutoff=80, processors=10)

classify.otu(list=Saanich.final.opti_mcc.unique_list.list, taxonomy=Saanich.final.nr_v128.wang.taxonomy, count=Saanich.final.count_table, label=0.03, cutoff=80, basis=otu, probs=F)

QIIME2 pipeline

Reads were demultiplexed and imported into QIIME2. Per-base read qualities were visualized to determine downstream trim parameters.

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path file/path/here/pe-33-manifest.csv \
  --output-path paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

qiime demux summarize \
  --i-data paired-end-demux.qza \
  --o-visualization demux.qzv

ASVs were generated using the Dada2 protocol using custom trim parameters for quality control. Resultant ASVs were classified using the SILVA database (release 119) using a 99% similarity threshold.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --o-table table \
  --o-representative-sequences rep-seqs \
  --p-trim-left-f 5 \
  --p-trim-left-r 12 \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 184

qiime feature-classifier classify-sklearn \
  --i-classifier silva-119-99-515-806-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

The ASV table was then converted to text format used to create a phyloseq object.

qiime tools export table.qza \
  --output-dir exported-feature-table

biom convert -i exported-feature-table/feature-table.biom -o feature-table.tsv \
  --to-tsv

Data Analysis

Analysis was completed in R v3.4.3 [5] using the following packages.

library(tidyverse)
library(phyloseq)
library(ggplot2)
library(dplyr)
library(stringr)
library(magrittr)
library(knitr)
library(gridExtra)
library(grid)
library(randomcoloR)

Alpha-diversities of clusters identified by mothur and QIIME2 from each sample were measured by the Shannon diversity index and the Chao1 richness estimator. Alpha-diversities were plotted against sample depth and oxygen concentration for both clustering methods, and were fitted using local polynomial regression models where appropriate. Relative abundances of all phylum level classifications produced by mothur and QIIME2 were also plotted for each sample.

Relative abundances of Chloroflexi OTUs and ASVs amongst all clusters were plotted, both as a whole and individually, across depth and oxygen gradients. Significances of correlations between these variables were based on linear regression models, as all variables are continuous and there is a lack of evidence to suggest curvilinear relationships between them.

Data cleaning

# generate random colours for use in figures.
palette <- distinctColorPalette(40) 

Data were loaded into R and samples normalized to 100,000 sequences per sample.

load("mothur_phyloseq.RData")
load("qiime2_phyloseq.RData")

# random seed set for reproducibility
set.seed(4831) 
m.norm = rarefy_even_depth(mothur, sample.size=100000)
q.norm = rarefy_even_depth(qiime2, sample.size=100000)

Relative abundance percentages were calculated for the data.

m.percent = transform_sample_counts(m.norm, function(x) 100 * x/sum(x))
q.percent = transform_sample_counts(q.norm, function(x) 100 * x/sum(x))

The phylum Chloroflexi was chosen.

phylum_name_mothur = "Chloroflexi"
phylum_name_qiime2 = "D_1__Chloroflexi"

Results

Change of microbial community structure with depth and oxygen concentration

Alpha-diversity

Shannon diversity index and Chao1 were calculated for the total microbial community across depth and oxygen concentration gradients for both mothur and QIIME2.

# Alpha-diversity of total community for mothur
m.alpha = estimate_richness(m.norm, measures = c("Chao1", "Shannon"))

m.meta.alpha = full_join(rownames_to_column(m.alpha),
  rownames_to_column(data.frame(m.percent@sam_data)), by = "rowname")

m.shannon.depth.plot <- m.meta.alpha %>% 
  ggplot() +
    geom_point(aes(x=Depth_m, y=Shannon)) +
    geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Shannon)) +
    labs(title="Mothur", y="Shannon diversity index", x=NULL)

m.chao1.depth.plot <- m.meta.alpha %>% 
  ggplot() +
    geom_point(aes(x=Depth_m, y=Chao1)) +
    geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Chao1)) +
    labs(title="Mothur", y="Chao1 richness estimator", x="Depth (m)")

m.shannon.o2.plot <- m.meta.alpha %>% 
  ggplot() +
    geom_point(aes(x=O2_uM, y=Shannon)) +
    # geom_smooth(method='auto', aes(x=as.numeric(O2_uM), y=Shannon)) +
    labs(title="Mothur", y="Shannon diversity index", x=NULL)

m.chao1.o2.plot <- m.meta.alpha %>% 
  ggplot() +
    geom_point(aes(x=O2_uM, y=Chao1)) +
    # geom_smooth(method='auto', aes(x=as.numeric(O2_uM), y=Chao1)) +
    labs(title="Mothur", y="Chao1 richness estimator", x="Oxygen (uM)")

# Alpha-diversity of total community for QIIME2
q.alpha = estimate_richness(q.norm, measures = c("Chao1", "Shannon"))

q.meta.alpha = full_join(rownames_to_column(q.alpha),
  rownames_to_column(data.frame(q.percent@sam_data)), by = "rowname")

q.shannon.depth.plot <- q.meta.alpha %>% 
  ggplot() +
    geom_point(aes(x=Depth_m, y=Shannon)) +
    geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Shannon)) +
    labs(title="Qiime2", y=NULL, x=NULL)

q.chao1.depth.plot <- q.meta.alpha %>% 
  ggplot() +
    geom_point(aes(x=Depth_m, y=Chao1)) +
    geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Chao1)) +
    labs(title="Qiime2", y=NULL, x="Depth (m)")

q.shannon.o2.plot <- q.meta.alpha %>% 
  ggplot() +
    geom_point(aes(x=O2_uM, y=Shannon)) +
    # geom_smooth(method='auto', aes(x=as.numeric(O2_uM), y=Shannon)) +
    labs(title="Qiime2", y=NULL, x=NULL)

q.chao1.o2.plot <- q.meta.alpha %>% 
  ggplot() +
    geom_point(aes(x=O2_uM, y=Chao1)) +
    # geom_smooth(method='auto', aes(x=as.numeric(O2_uM), y=Chao1)) +
    labs(title="Qiime2", y=NULL, x="Oxygen (uM)")
# Plotting depth graph
grid.arrange(m.shannon.depth.plot, q.shannon.depth.plot, m.chao1.depth.plot, q.chao1.depth.plot, ncol=2, top=textGrob("Figure 1 Alpha-diversity across Depth",gp=gpar(fontsize=16,font=3)))

The same patterns of alpha-diversity (Shannon diversity index and the Chao1 richness estimator) can be observed across depth for both mothur and QIIME2 (Fig. 1). There is a slightly lower diversity in surface waters (0m) compared to 100m depth. Peak diversity is reached at ~100-120m then diversity decreases with greater depth, with a slight increase at 200m for all but Shannon diversity index for QIIME2.

Note, however, that despite the similarity in the alpha-diversity pattern, the comparison of mothur versus QIIME2 shows difference: across all depths, mothur OTU analysis resulted in a lower alpha-diversity than the QIIME2 ASV analysis when measured with the Shannon diversity index and a higher alpha-diversity than the QIIME2 ASV analysis when measured with Chao1.

# Plotting oxygen graph
grid.arrange(m.shannon.o2.plot, q.shannon.o2.plot, m.chao1.o2.plot, q.chao1.o2.plot, ncol=2, top=textGrob("Figure 2 Alpha-diversity across Oxygen Concentration",gp=gpar(fontsize=16,font=3)))

Looking at Shannon diversity across oxygen concentration (Fig. 2), we find that at equivalent depths QIIME2 has a greater diversity than mothur. However, the pattern exhibited by both mothur and QIIME2 data is still similar. The three lowest diversity points (note for mothur: 2 points at 2.35 overlap) are at an oxygen concentration of 0 uM, while the highest diversity is found at an oxygen concentration of ~38 uM. The band of 95% confidence intervals was not plotted due to the lack of data between ~38 uM and ~217 uM of oxygen.

Comparing Chao1 at different oxygen levels for mothur and QIIME2 shows that the patterns somewhat differ. While the three lowest diversity points are still at 0 uM of oxygen, for mothur the highest diversity in terms of Chao1 is at an oxygen concentration of ~38 uM, while for QIIME2 it is at an oxygen concentration of ~32 uM. For both, oxygen concentration of ~217 uM shows a notable decrease in diversity. Chao1 exhibited a relatively greater drop at ~217 uM of oxygen compared to Shannon.

Taxonomic community composition

# Mothur
m.phyla.plot = m.percent %>%
  plot_bar(fill="Phylum")+
    geom_bar(aes(fill=Phylum), stat="identity")+
    labs(title="Figure 3 Phyla across Samples for Mothur", y="Abundance (%)")+ 
    scale_fill_manual(values=palette)

# QIIME2
q.phyla.plot = q.percent %>%
  plot_bar(fill="Phylum")+
    geom_bar(aes(fill=Phylum), stat="identity")+
    labs(title="Figure 4 Phyla across Samples for QIIME2", y="Abundance (%)")+
    scale_fill_manual(values=palette)



28 and 29 taxons were identified at the phylum level with mothur and QIIME2, respectively (Fig. 3,4). Out of these identified phyla in both mothur and QIIME2, ~4 dominated the community composition in terms of abundance: Proteobacteria, Bacteroidetes, Thaumarchaeota and Actinobacteria (from most to less abundant). Other phyla that are noticeably more abundant include Cyanobacteria, Deferribacteres, Euryarchaeota, Firmiucutes, Gemmatimonadetes, Marinimicrobia, Nitrospinae, Planctomycetes and Verrucomicrobia. Our phylum of interest, Chloroflexi, makes up from 0 to 6% of the microbial community in the collected samples depending on depth. A more specific naming system seems to be used by mothur than QIIME2, which results in a more descriptive labelling of the population composition in the former.

Chloroflexi abundance with depth and oxygen concentration

# Significance across depth
m.chlor.lm = m.norm %>% 
  subset_taxa(Phylum==phylum_name_mothur) %>% 
  tax_glom(taxrank = 'Phylum') %>%
  psmelt() %>%
  lm(Abundance ~ Depth_m, .) %>% 
  summary()

q.chlor.lm = q.norm %>% 
  subset_taxa(Phylum==phylum_name_qiime2) %>% 
  tax_glom(taxrank = 'Phylum') %>%
  psmelt() %>%
  lm(Abundance ~ Depth_m, .) %>% 
  summary()

taxon.abundance = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
taxon.abundance <- rbind(taxon.abundance, m.chlor.lm$coefficients["Depth_m",])
taxon.abundance <- rbind(taxon.abundance, q.chlor.lm$coefficients["Depth_m",])
rownames(taxon.abundance) <- (c("mothur", "QIIME2"))
colnames(taxon.abundance) <- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
kable(taxon.abundance,caption="Table 1 Correlation Data of Chloroflexi Phylum across Depth")
Table 1 Correlation Data of Chloroflexi Phylum across Depth
Estimate Std. Error t value Pr(>|t|) (p-value)
mothur 1.327529 0.6389862 2.077554 0.0923485
QIIME2 2.622128 0.4212043 6.225311 0.0015644
m.abd.depth.plot <- m.percent %>% 
  subset_taxa(Phylum==phylum_name_mothur) %>% 
  psmelt() %>% 
  group_by(Sample) %>% 
  summarize(Abundance_sum=sum(Abundance), Depth_m=mean(Depth_m)) %>% 
  
ggplot() +
  geom_point(aes(x=Depth_m, y=Abundance_sum)) +
  geom_smooth(method='lm', aes(x=as.numeric(Depth_m), y=Abundance_sum)) +
  labs(title="Mothur", y="Abundance (%)", x="Depth (m)")

q.abd.depth.plot <- q.percent %>% 
  subset_taxa(Phylum==phylum_name_qiime2) %>% 
  psmelt() %>% 
  group_by(Sample) %>% 
  summarize(Abundance_sum=sum(Abundance), Depth_m=mean(Depth_m)) %>% 
  
ggplot() +
  geom_point(aes(x=Depth_m, y=Abundance_sum)) +
  geom_smooth(method='lm', aes(x=as.numeric(Depth_m), y=Abundance_sum)) +
  labs(title="QIIME2", y=NULL, x="Depth (m)")
# Significance across oxygen concentrations
m.chlor.lm.ox = m.norm %>% 
  subset_taxa(Phylum==phylum_name_mothur) %>% 
  tax_glom(taxrank = 'Phylum') %>%
  psmelt() %>%
  lm(Abundance ~ O2_uM, .) %>% 
  summary()

q.chlor.lm.ox = q.norm %>% 
  subset_taxa(Phylum==phylum_name_qiime2) %>% 
  tax_glom(taxrank = 'Phylum') %>%
  psmelt() %>%
  lm(Abundance ~ O2_uM, .) %>% 
  summary()

taxon.abundance.ox = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
taxon.abundance.ox <- rbind(taxon.abundance.ox, m.chlor.lm.ox$coefficients["O2_uM",])
taxon.abundance.ox <- rbind(taxon.abundance.ox, q.chlor.lm.ox$coefficients["O2_uM",])
rownames(taxon.abundance.ox) <- (c("mothur", "QIIME2"))
colnames(taxon.abundance.ox) <- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
kable(taxon.abundance.ox,caption="Table 2 Correlation Data of Chloroflexi Phylum across Oxygen Concentration")
Table 2 Correlation Data of Chloroflexi Phylum across Oxygen Concentration
Estimate Std. Error t value Pr(>|t|) (p-value)
mothur -0.750471 0.5865861 -1.279387 0.2569128
QIIME2 -1.731996 0.5762708 -3.005525 0.0299088
m.abd.o2.plot <- m.percent %>% 
  subset_taxa(Phylum==phylum_name_mothur) %>% 
  psmelt() %>% 
  group_by(Sample) %>% 
  summarize(Abundance_sum=sum(Abundance), O2_uM=mean(O2_uM)) %>% 
  
ggplot() +
  geom_point(aes(x=O2_uM, y=Abundance_sum)) +
  geom_smooth(method='lm', aes(x=as.numeric(O2_uM), y=Abundance_sum)) +
  labs(title="Mothur", y="Abundance (%)", x="O2 (uM)")

q.abd.o2.plot <- q.percent %>% 
  subset_taxa(Phylum==phylum_name_qiime2) %>% 
  psmelt() %>% 
  group_by(Sample) %>% 
  summarize(Abundance_sum=sum(Abundance), O2_uM=mean(O2_uM)) %>% 
  
ggplot() +
  geom_point(aes(x=O2_uM, y=Abundance_sum)) +
  geom_smooth(method='lm', aes(x=as.numeric(O2_uM), y=Abundance_sum)) +
  labs(title="QIIME2", y=NULL, x="O2 (uM)")
# Plotting depth graph
grid.arrange(m.abd.depth.plot, q.abd.depth.plot, ncol=2, top=textGrob("Figure 5 Chloroflexi Abundance across Depth",gp=gpar(fontsize=16,font=3)))

# Plotting oxygen graph
grid.arrange(m.abd.o2.plot, q.abd.o2.plot, ncol=2, top=textGrob("Figure 6 Chloroflexi Abundance across Oxygen Concentration",gp=gpar(fontsize=16,font=3)))

Linear regression analysis of Chloroflexi relative abundance across depth revealed variations between mothur’s OTU and QIIME2’s ASV clustering (Fig. 5). Abundance of ASV clusters revealed a significant correlation with depth (p<0.05), while OTU clusters did not (Table 1). Both correlations were found to be positive.

Similarly, linear regression analysis of Chloroflexi relative abundance across oxygen concentration (Fig. 6) revealed a significant correlation of oxygen concentration with ASV clusters (p<0.05), but not with OTU clusters (Table 2). Both correlations were found to be negative.

Richness within the Chloroflexi phylum

# Number of OTUs
m.tax_table = data.frame(m.norm@tax_table)
m.filtered = m.tax_table %>% 
  rownames_to_column('OTU') %>%
  filter(Phylum==phylum_name_mothur) %>%
  column_to_rownames('OTU')
m.rownumber = nrow(m.filtered)
# Classes in OTUs
m.classes = m.filtered %>%
  select('Class') %>%
  unique %>%
  summarise(Classes = toString(Class))
# Number of ASVs
q.tax_table = data.frame(q.norm@tax_table)
q.filtered = q.tax_table %>% 
  rownames_to_column('ASV') %>%
  filter(Phylum==phylum_name_qiime2) %>%
  column_to_rownames('ASV')
q.rownumber = nrow(q.filtered)
# Classes in ASVs
q.classes = q.filtered %>%
  select('Class') %>%
  unique %>%
  summarise(Classes = toString(Class))
For Chloroflexi, the number of OTUs was found to be 34, and the number of ASVs was found to be 47. The OTUs represent classes:
, while the ASVs represent classes:

Change of abundances of OTUs/ASVs within the Chloroflexi phylum with depth and oxygen concentration

# Example for linear model
otu_stats = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
for (otu in row.names(m.filtered)){
  linear_fit = m.norm %>% 
    psmelt() %>% 
    filter(OTU==otu) %>% 

    lm(Abundance ~ Depth_m, .) %>% 
    summary()
  otu_data = linear_fit$coefficients["Depth_m",]
  otu_stats <- rbind(otu_stats, otu_data)
}
colnames(otu_stats)<- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
row.names(otu_stats) <- row.names(m.filtered)
otu_stats = cbind(data.frame(Class = m.filtered$Class), Genus = m.filtered$Genus, otu_stats)
sorted = arrange(rownames_to_column(otu_stats),Estimate)%>% column_to_rownames(var="rowname")
lm.depth.otus = kable(sorted,caption="Table A1 Correlation data of Chloroflexi OTUs Abundance with Depth")
# Example for correlation graph
m.percent %>% 
  subset_taxa(Phylum==phylum_name_mothur) %>% 
  psmelt() %>% 
  
  ggplot() +
  geom_point(aes(x=Depth_m, y=Abundance)) +
  geom_smooth(method='lm', aes(x=Depth_m, y=Abundance)) +
  facet_wrap(~OTU, scales="free_y") +
  labs(title="Figure 7 Abundance of Chloroflexi OTUs across Depth") +
  xlab("Depth (m)") + 
  ylab("Abundance (%)") + 
  theme(axis.text.x = element_text(angle = 90))

Linear model statistics were performed for the abundance of each OTU and ASV in relation to depth and oxygen concentration (Appendix A Table A1-A4). The linear models were subsequently plotted (Fig. 7-10). No significant correlations were found between any individual OTUs/ASVs abundance and depth or oxygen concentration (p > 0.05 for all).

Although none of the correlations were significant, mothur and QIIME2 showed similar trends. For mothur ten of the 34 OTUs had negative correlation between abundance and depth (the rest positive), while for QIIME2 nine of the 47 ASVs had negative correlation between abundance and depth (the rest positive). This was while for abundance versus oxygen concentration, for mothur all OTUs had negative correlation, and for QIIME2 all but one ASVs had negative correlation.

Change of abundances of OTUs/ASVs within the Chloroflexi phylum with hydrogen sulfide concentration

Linear model statistics were performed for the abundance of each OTU and ASV in relation to hydrogen sulfide concentration (Appendix A Table A5 & A6). The linear models were subsequently plotted (Fig. 11 & 12). Significant positive correlations between abundance and hydrogen sulfide concentration were found for 15 and 19 individual OTUs and ASVs, respectively. For OTUs and ASVs, p-values were <0.05. Classes associated with OTUs and ASVs positively correlating with hydrogen sulfide concentration include Anaerolineae, Dehalococcoidia and the SAR202 clade.

QIIME2 identified 9 more individual members of Chloroflexi that significantly and positively correlate with hydrogen sulfide concentration compared to mothur. In addition, the significance of the results and the positive correlation with hydrogen sulfide concentration were generally greater with QIIME2 than mothur.

Mothur versus QIIME2

As noted in the different sections, in general, major patterns found in a similar way in both mothur and QIIME2. Nevertheless, mothur and QIIME2 outputted in slightly different results as described in the different parts.

Discussion

UPDATE - 8:00 pm March 24, ~1340 words.

If you found significant differences for your taxon across depth/oxygen, why might these be occurring (hint think about the accompanying geochemical data we have)?

The Saanich Inlet provides a good model for the study of oxygen minimum zones and the microbial dynamics that shape and define them. The microbial diversity analyses we carried out with the two bioinformatic pipelines mothur and QIIME2 resulted in mostly similar patterns, but there were differences when it came to the details. Both pipelines found that the peak Shannon diversity index was at 100 m. However, the peak Chao1 richness estimates for the total community is different between the mothur and QIIME2 analyses, with mothur finding 100 m the most diverse while QIIME2 found 120 m as the most diverse. We see the same discrepancy between mothur and QIIME2 and their Chao1 richness estimates when looking at alpha diversity across dissolved oxygen concentrations (Figure 2). As for discrepancies between Shannon diversity and Chao1 richness, we found that at the dissolved oxygen concentration of 217 μm Chao1 was relatively lower than Shannon for both mothur and QIIME2. The lower value of Chao1 compared to Shannon at ~217uM could indicate increased species evenness despite reduced species diversity because Chao1 does not take into account evenness while Shannon does. The two pipelines also agreed that Saanich Inlet is dominated by only 4-5 phyla, with the phylum Chloroflexi making up 0-6% of the microbial community depending on the sample depth. The phylum Chloroflexi contains bacteria with different metabolic characteristics, such as aerobic thermophiles, anoxygenic phototrophs, and anaerobic halorespirers which use halogenated organics as electron acceptors [22]. In our analysis, we found that Chloroflexi abundance was positively correlated with depth (Table 1). This may be a result of oxygen concentrations decreasing with depth: a negative correlation between Chloroflexi abundance and oxygen concentrations was also found (Table 2). Interestingly, these results were determined to only be significant in the QIIME2 analysis, but not the mothur analysis. This discrepancy is discussed later and points out to the fact that the analysis methods, in this case, plays a role in whether our analysis results are significant or not. Therefore, there is an indication for a preference for members of Chloroflexi to inhabit anoxic habitats at depth within Saanich Inlet, and this is supported by previously mentioned research on this phylum [9, 14, 19]. Since Chloroflexi encompasses such vastly different classes of bacteria with disparate metabolistic behavior, it is important to note that the diverse classes within this phylum have different requirements for oxygen content where some of them also thrive in oxic environments [9, 14, 19]. This may explain why some of the abundance vs. depth and oxygen concentration results with mothur where not as significant (Table 1 & 2). Anoxic oceanic zones such as the deep waters of Saanich Inlet may provide an environment in which anaerobic members of Chloroflexi can be more competitive than other phyla and dominate the microbial population.

The richness within *Chloroflexi* highly depends on which bioinformatic pipeline is used for the analysis. Mothur was only able to identify 34 OTUs occupying 2 classes within *Chloroflexi*, while QIIME2 was able to identify 47 ASVs and 5 classes within *Chloroflexi*. However, QIIME2 was unable to identify any genera while mothur was (Figure ?). Therefore, the required depth into the taxonomic tree may dictate which pipeline should be used in future analyses. Correlations between individual OTUs and ASVs within *Chloroflexi* against depth or oxygen concentration were found to be insignificant (Appendix A Table 1-4). Linear models in figure 7-10 show similar trends between the individual OTUs and ASV with depth and oxygen concentration, however there are glaring single outlier defying any correlation between the data points. These outliers are present in all analyses but do not occur at the same depths or oxygen levels for the various ASVs and OTUs studied. This may mean that there is no pattern or rationale for their occurrences. More data would allow to either legitimize these data points or confirm them as outliers to either integrate them or exclude them from the results. 

Strong positive and significant correlations were found between various members of the *Chloroflexi* phylum and hydrogen sulfide concentrations. Classes correlating with hydrogen sulfide levels include Anaerolineae, Dehalococcoidia and the SAR202 clade (Figure 11 & 12, Appendix A Table 5 & 6). These results are supported by previous studies that have identified members of SAR202 cluster belonging to the *Chloroflexi* phylum to play a major role in the sulfur cycle in the dark water column. Some of these members have pathways for sulfur reduction and could be responsible for the hydrogen sulfide concentrations at depth. As previously stated, they also have the potential to metabolize a variety of organosulfur compounds [17]. In addition, single-cell genomics studies of members of the Dehalococcoidia class within the *Chloroflexi* phylum highlighted their association with marine sediments and sulfur cycling [11].

Differences in bioinformatic pipelines for microbial ecology data analysis may result in potential differences in analytical outcomes, and can lead to misidentifying species in different habitats or incorrectly determining trends. As we observed in this study, although mothur and QIIME2 often produced similar patterns, they did not agree on details. While QIIME2 led us to conclude that there was a presence of four classes (plus one uncultured class) of Chloroflexi in our samples, mothur only identified two. These differences are concerning, since depending on the pipeline we use, we might miss organisms we have collected, or we might identify organisms that are in reality not there. Consequently, we can be drawing the wrong conclusions about ecosystems, and the interplay of its inhabitants. Furthermore, whether we find significant correlations can also depend on the pipeline used. While Chloroflexi abundance was found to significantly correlate with depth and oxygen concentration for QIIME2, the correlation was not significant for mothur. In fact, the significances differed by an order of magnitude. This emphasizes the concern that significance in analysis findings may rest upon the usage of different pipelines and clustering paradigms, leading to false-positives and false-negatives. This also highlights the importance of developing objective metrics to gauge the accuracy of both clustering paradigms.

Differences we observed between the usage of mothur versus QIIME2 to analyse the reads in this study can also be seen in other research. For instance, a study examining the composition of chicken cecum microbiome performed by Allalil et al. revealed lower phylogenetic diversity (PD) values when UPARSE pipelines were used in comparison to de novo QIIME pipelines and open reference QIIME pipelines [23]. However, Species Richness (S) values were comparable when comparing different pipelines. In addition, the number of assigned sequences for different sequencing platform runs were impacted because of OTU picking using different pipelines: De novo vs. open reference QIIME pipelines. Furthermore, the QIIME pipelines generated different relative abundance of specific genera in comparison to UPARSE. Moreover, differences in the detection profiles, such as the number of unique species, were observed when using different pipelines. The number of OTUs and taxonomic assignments produced and identified differed between pipelines with a 99% similarity threshold.

Since multiple classes were identified under the order Chloroflexi and variations in the directions of correlation with depth were observed for several clusters, future analyses may find more meaning at a sub-phylum level. Additionally, more data were obtained than were analyzed in the current report. In the future, one could look for correlations of abundance across the other factors, such as temperature or salinity, not just depth and oxygen concentration. Furthermore, there is a gap in the data between the depth of 10m and depth of 100m, which makes it difficult to determine correlations. More consistent data collection with more samples and at more regular intervals could help alleviate such problems, and potentially show significant correlations. Analysis of data available from the collection of samples over time could be interesting in exploring how the diversity in the area changes over seasons, or over longer time periods, such as decades. It could also be of interest, for any unknown, or not very well known, organisms to look into more details of their genetic make up in order to determine what roles, if any, they might play in biogeochemical cycles.

References

[1] Zaikova E., Walsh DA, Stilwell CP, Mohn WW, Tortell PD, Hallam SJ. 2010. Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia. Environmental Microbiology 12:172-191.

[2] Herlinveaux RH. 2011. Journal of the Fisheries Research Board of Canada 19: 1-37.

[3] 2012. Saanich Inlet. MicrobeWiki.

[4] Blaxter M, Mann J, Chapman T, Thomas F, Whitton C, Floyd R, Abebe E. 2005. Defining operational taxonomic units using DNA barcode data. Philosophical Transactions of the Royal Society B: Biological Sciences 360: 1935–1943.

[5] Callahan BJ, Mcmurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker gene data analysis. Multidisciplinary Journal of Microbial Ecology 11: 2639–2643.

[6] Schmidt TSB, Rodrigues JFM, Christian M. 2014. Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale. PLoS Comput Biol. 10.

[7] Wang, Y., Sheng, H., He, Y., Wu, J., Jiang, Y., Tam, N. F., & Zhou, H. 2012. Comparison of the levels of bacterial diversity in freshwater, intertidal wetland, and marine sediments by using millions of illumina tags. Applied and Environmental Microbiology 78: 8264-8271. 10.1128/AEM.01821-12

[8] Thiel V, Hamilton TL, Tomsho LP, Burhans R, Gay SE, Schuster SC, et al. 2014. Draft genome sequence of a sulfide-oxidizing, autotrophic filamentous anoxygenic phototrophic bacterium, Chloroflexus sp. strain MS-G (Chloroflexi). Genome Announc 2: 9–10.

[9] Sekiguchi Y, Yamada T, Hanada S, Ohashi A, Harada H, Kamagata Y. 2003. Anaerolinea thermophila gen. nov., sp. nov. and Caldilinea aerophila gen. nov., sp. nov., novel filamentous thermophiles that represent a previously uncultured lineage of the domain bacteria at the subphylum level. Int J Syst Evol Microbiol 53: 1843–51.

[10] Kiss H, Nett M, Domin N, Martin K, Maresca JA, Copeland A, Lapidus A, Lucas S, Berry KW, Rio TGD, Dalin E, Tice H, Pitluck S, Richardson P, Bruce D, Goodwin L, Han C, Detter JC, Schmutz J, Brettin T, Land M, Hauser L, Kyrpides NC, Ivanova N, Göker M, Woyke T, Klenk H-P, Bryant DA. 2011. Complete genome sequence of the filamentous gliding predatory bacterium Herpetosiphon aurantiacus type strain (114-95T). Standards in Genomic Sciences 5: 356–370.

[11] Wasmund, K., Cooper, M., Schreiber, L., Lloyd, K. G., Baker, B. J., Petersen, D. G., . . . Adrian, L. 2016. Single-cell genome and group-specific dsrAB sequencing implicate marine members of the class dehalococcoidia (phylum chloroflexi) in sulfur cycling. Mbio 7: e00266. 10.1128/mBio.00266-16

[12] Biderre-Petit C, Dugat-Bony E, Mege M, Parisot N, Adrian L, Moné A, Denonfoux J, Peyretaillade E, Debroas D, Boucher D, Peyret P. 2016. Distribution of Dehalococcoidia in the Anaerobic Deep Water of a Remote Meromictic Crater Lake and Detection of Dehalococcoidia-Derived Reductive Dehalogenase Homologous Genes. Plos One 11.

[13] Hugenholtz, P., Goebel, B. M., & Pace, N. R. 1998. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. Journal of Bacteriology 18: 4765-4774.

[14] Xia Y, Wang Y, Wang Y, Chin FYL, Zhang T. 2016. Cellular adhesiveness and cellulolytic capacity in Anaerolineae revealed by omics-based genome interpretation. Biotechnology for Biofuels 9.

[15] Giovannoni SJ, Rappe MS, Vergin KL, Adair NL. 1996. 16S rRNA genes reveal stratified open ocean bacterioplankton populations related to the Green Non-Sulfur bacteria. Proceedings of the National Academy of Sciences 93:7979–7984.

[16] Morris RM, Rappé MS, Urbach E, Connon SA, Rappe MS, Giovannoni SJ. 2004. Prevalence of the Chloroflexi-related SAR202 bacterioplankton cluster throughout the mesopelagic zone and deep ocean. Appl Environ Microbiol 70: 2836–42.

[17] Mehrshad M, Rodriguez-Valera F, Amoozegar MA, López-García P, Ghai R. 2017. The enigmatic SAR202 cluster up close: shedding light on a globally distributed dark ocean lineage involved in sulfur cycling. The ISME Journal 12: 655–668.

[18] Wegner C-E, Liesack W. 2017. Unexpected Dominance of Elusive Acidobacteria in Early Industrial Soft Coal Slags. Frontiers in Microbiology 8.

[19] Ye Q, Wu Y, Zhu Z, Wang X, Li Z, Zhang J. 2016. Bacterial diversity in the surface sediments of the hypoxic zone near the Changjiang Estuary and in the East China Sea. MicrobiologyOpen 5: 323–339.

[20] Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Scofield M, Payne C, Pakhomova L, Kheirandish S, Finke J, Bhatia M, Shevchuk O, Gies EA, Fairley D, Michiels C, Suttle CA, Whitney F, Crowe SA, Tortell PD, Hallam SJ. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data 4: 170159.

[21] Hawley AK, Torres-Beltrán M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchuk O, Gies EA, Fairley D, Malfatti SA, Norbeck AD, Brewer HM, Pasa-Tolic L, del Rio TG, Suttle CA, Tringe S, Hallam SJ. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Sci Data 4: 170160.

[22] “Chloroflexi (phylum)” on Revolvy.com. Trivia Quizzes.

[23] Allali I, Arnold J.W., Roach J, Cadenas M.B., Butz N, Hassan H.M., Koci M, Ballou A, Mendoza M, Ali R, Azcarate-Peril M.A. 2017. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome. BMC Microbiology 17: 1-16.

Appendices

Appendix A: Correlation of Chloroflexi OTUs/ASVs Abundance with Depth and Oxygen Concentration

Table A1 Correlation data of Chloroflexi OTUs Abundance with Depth
Class Genus Estimate Std. Error t value Pr(>|t|) (p-value)
Otu0181 SAR202_clade SAR202_clade_ge -0.1713584 0.3794814 -0.4515595 0.6704985
Otu1579 SAR202_clade SAR202_clade_ge -0.0073322 0.0162544 -0.4510917 0.6708136
Otu1149 SAR202_clade SAR202_clade_ge -0.0035352 0.0082586 -0.4280621 0.6864177
Otu4286 SAR202_clade SAR202_clade_ge -0.0035352 0.0082586 -0.4280621 0.6864177
Otu1064 SAR202_clade SAR202_clade_ge -0.0027496 0.0165362 -0.1662767 0.8744539
Otu2632 SAR202_clade SAR202_clade_ge -0.0023568 0.0055057 -0.4280621 0.6864177
Otu4287 SAR202_clade SAR202_clade_ge -0.0011784 0.0027529 -0.4280621 0.6864177
Otu2381 Anaerolineae uncultured -0.0005237 0.0056008 -0.0935100 0.9291298
Otu2592 SAR202_clade SAR202_clade_ge -0.0005237 0.0056008 -0.0935100 0.9291298
Otu2591 SAR202_clade SAR202_clade_ge -0.0002619 0.0028004 -0.0935100 0.9291298
Otu1577 SAR202_clade SAR202_clade_ge 0.0001637 0.0036177 0.0452401 0.9656672
Otu3712 Anaerolineae uncultured 0.0008511 0.0055928 0.1521723 0.8850009
Otu3607 Anaerolineae uncultured 0.0034043 0.0023533 1.4465667 0.2076595
Otu2790 Anaerolineae Thermomarinilinea 0.0034043 0.0023533 1.4465667 0.2076595
Otu3623 Anaerolineae Thermomarinilinea 0.0036007 0.0053694 0.6705821 0.5322101
Otu4340 Anaerolineae Anaerolineaceae_unclassified 0.0068085 0.0047067 1.4465667 0.2076595
Otu2789 Anaerolineae Thermomarinilinea 0.0068085 0.0047067 1.4465667 0.2076595
Otu1558 Anaerolineae Anaerolineaceae_unclassified 0.0070049 0.0049223 1.4231039 0.2139907
Otu1863 Anaerolineae Pelolinea 0.0079214 0.0046360 1.7086714 0.1482107
Otu3589 Anaerolineae uncultured 0.0136170 0.0094133 1.4465667 0.2076595
Otu1419 Anaerolineae Thermomarinilinea 0.0136170 0.0094133 1.4465667 0.2076595
Otu2497 Anaerolineae Pelolinea 0.0136170 0.0094133 1.4465667 0.2076595
Otu1147 Anaerolineae Thermomarinilinea 0.0146645 0.0110438 1.3278449 0.2416173
Otu1983 Anaerolineae Thermomarinilinea 0.0158101 0.0123144 1.2838745 0.2554599
Otu1246 Anaerolineae Thermomarinilinea 0.0158429 0.0092720 1.7086714 0.1482107
Otu1851 Anaerolineae Anaerolineaceae_unclassified 0.0170213 0.0117667 1.4465667 0.2076595
Otu0662 Anaerolineae Thermomarinilinea 0.0340426 0.0235333 1.4465667 0.2076595
Otu0551 Anaerolineae Thermomarinilinea 0.0365957 0.0465283 0.7865264 0.4671821
Otu1028 Anaerolineae Thermomarinilinea 0.0374468 0.0258867 1.4465667 0.2076595
Otu0607 Anaerolineae Thermomarinilinea 0.0389853 0.0438394 0.8892756 0.4145865
Otu0799 Anaerolineae Pelolinea 0.0477578 0.0280660 1.7016226 0.1495636
Otu0217 Anaerolineae Anaerolineaceae_unclassified 0.1946645 0.2102439 0.9258985 0.3969899
Otu0215 Anaerolineae Thermomarinilinea 0.4527660 0.3129935 1.4465667 0.2076595
Otu0195 Anaerolineae Anaerolineaceae_unclassified 0.5344681 0.3694735 1.4465667 0.2076595
Table A2 Correlation Data of Chloroflexi ASVs Abundance with Depth
Class Genus Estimate Std. Error t value Pr(>|t|) (p-value)
Asv1886 D_2__SAR202 clade D_5__ -0.3397709 0.2301978 -1.4759954 0.1999714
Asv800 D_2__SAR202 clade D_5__ -0.1327332 0.3683890 -0.3603073 0.7333378
Asv1266 D_2__SAR202 clade D_5__ -0.0329951 0.0770801 -0.4280621 0.6864177
Asv1289 D_2__Anaerolineae D_5__uncultured -0.0164975 0.0385401 -0.4280621 0.6864177
Asv1979 D_2__Anaerolineae D_5__uncultured -0.0057610 0.0616089 -0.0935100 0.9291298
Asv1144 D_2__SAR202 clade D_5__ -0.0039280 0.1354082 -0.0290085 0.9779801
Asv341 D_2__JG30-KF-CM66 D_5__ -0.0035352 0.0082586 -0.4280621 0.6864177
Asv1862 D_2__Anaerolineae D_5__uncultured -0.0034043 0.0364052 -0.0935100 0.9291298
Asv1260 D_2__Anaerolineae D_5__uncultured -0.0007856 0.0084012 -0.0935100 0.9291298
Asv2081 D_2__Anaerolineae D_5__uncultured 0.0011129 0.0027583 0.4034830 0.7032691
Asv2034 D_2__SAR202 clade D_5__ 0.0038298 0.0251674 0.1521723 0.8850009
Asv1142 D_2__JG30-KF-CM66 D_5__ 0.0057610 0.0801057 0.0719180 0.9454553
Asv1046 D_2__Anaerolineae D_5__uncultured 0.0111293 0.0275831 0.4034830 0.7032691
Asv2247 D_2__JG30-KF-CM66 D_5__ 0.0126023 0.0187931 0.6705821 0.5322101
Asv400 D_2__uncultured D_5__ 0.0136170 0.0094133 1.4465667 0.2076595
Asv496 D_2__Dehalococcoidia NA 0.0136170 0.0094133 1.4465667 0.2076595
Asv2063 D_2__Anaerolineae NA 0.0189198 0.0468912 0.4034830 0.7032691
Asv134 D_2__Anaerolineae NA 0.0234043 0.0349014 0.6705821 0.5322101
Asv1473 D_2__SAR202 clade NA 0.0238298 0.0164733 1.4465667 0.2076595
Asv1794 D_2__Anaerolineae D_5__uncultured 0.0238298 0.0164733 1.4465667 0.2076595
Asv1234 D_2__SAR202 clade D_5__ 0.0272340 0.0188267 1.4465667 0.2076595
Asv477 D_2__Anaerolineae D_5__uncultured 0.0288052 0.0429556 0.6705821 0.5322101
Asv590 D_2__Anaerolineae D_5__uncultured 0.0306383 0.0211800 1.4465667 0.2076595
Asv1003 D_2__SAR202 clade D_5__ 0.0306383 0.0211800 1.4465667 0.2076595
Asv1282 D_2__Anaerolineae D_5__uncultured 0.0340426 0.0235333 1.4465667 0.2076595
Asv490 D_2__Anaerolineae D_5__uncultured 0.0396072 0.0859823 0.4606434 0.6643958
Asv1664 D_2__Anaerolineae D_5__uncultured 0.0414075 0.0617486 0.6705821 0.5322101
Asv1939 D_2__Anaerolineae D_5__Longilinea 0.0418331 0.0911012 0.4591934 0.6653680
Asv1163 D_2__Anaerolineae D_5__uncultured 0.0476596 0.0329467 1.4465667 0.2076595
Asv473 D_2__Anaerolineae D_5__uncultured 0.0522095 0.0778570 0.6705821 0.5322101
Asv2315 D_2__Anaerolineae D_5__uncultured 0.0578723 0.0400067 1.4465667 0.2076595
Asv1693 D_2__Anaerolineae D_5__uncultured 0.0583633 0.0676342 0.8629259 0.4276204
Asv555 D_2__Anaerolineae D_5__uncultured 0.0748936 0.0517734 1.4465667 0.2076595
Asv1943 D_2__Anaerolineae D_5__uncultured 0.0792144 0.1181278 0.6705821 0.5322101
Asv428 D_2__Anaerolineae D_5__uncultured 0.0955810 0.1216822 0.7854968 0.4677332
Asv114 D_2__JG30-KF-CM66 D_5__ 0.1054664 0.1601592 0.6585100 0.5393201
Asv2324 D_2__Anaerolineae NA 0.1089362 0.0753067 1.4465667 0.2076595
Asv1423 D_2__Anaerolineae D_5__Longilinea 0.1123404 0.0776600 1.4465667 0.2076595
Asv1505 D_2__Anaerolineae D_5__uncultured 0.1123404 0.0776600 1.4465667 0.2076595
Asv271 D_2__Anaerolineae D_5__uncultured 0.1361702 0.0941334 1.4465667 0.2076595
Asv208 D_2__Anaerolineae D_5__uncultured 0.1468412 0.1522869 0.9642409 0.3792105
Asv1095 D_2__Anaerolineae NA 0.1634043 0.1129601 1.4465667 0.2076595
Asv161 D_2__Anaerolineae D_5__uncultured 0.1668085 0.1153134 1.4465667 0.2076595
Asv1108 D_2__Anaerolineae D_5__uncultured 0.1669722 0.1917355 0.8708462 0.4236697
Asv408 D_2__Anaerolineae D_5__uncultured 0.1859247 0.2109964 0.8811750 0.4185601
Asv1071 D_2__Anaerolineae D_5__uncultured 0.3438298 0.2376868 1.4465667 0.2076595
Asv1749 D_2__Anaerolineae D_5__uncultured 0.5208511 0.3600602 1.4465667 0.2076595
Table A3 Correlation Data of Chloroflexi OTUs Abundance with Oxygen Concentration
Class Genus Estimate Std. Error t value Pr(>|t|) (p-value)
Otu0195 Anaerolineae Anaerolineaceae_unclassified -0.1897296 0.3302310 -0.5745358 0.5904867
Otu0217 Anaerolineae Anaerolineaceae_unclassified -0.1736086 0.1582994 -1.0967102 0.3227568
Otu0215 Anaerolineae Thermomarinilinea -0.1607263 0.2797499 -0.5745358 0.5904867
Otu0181 SAR202_clade SAR202_clade_ge -0.0520091 0.2990620 -0.1739074 0.8687601
Otu0607 Anaerolineae Thermomarinilinea -0.0317385 0.0336870 -0.9421584 0.3893700
Otu0551 Anaerolineae Thermomarinilinea -0.0269047 0.0362727 -0.7417334 0.4915977
Otu0799 Anaerolineae Pelolinea -0.0200649 0.0258114 -0.7773675 0.4721013
Otu1028 Anaerolineae Thermomarinilinea -0.0132932 0.0231372 -0.5745358 0.5904867
Otu0662 Anaerolineae Thermomarinilinea -0.0120847 0.0210338 -0.5745358 0.5904867
Otu1983 Anaerolineae Thermomarinilinea -0.0084593 0.0103315 -0.8187858 0.4501563
Otu1147 Anaerolineae Thermomarinilinea -0.0084593 0.0092049 -0.9189965 0.4002601
Otu1246 Anaerolineae Thermomarinilinea -0.0072508 0.0084400 -0.8590967 0.4295406
Otu1851 Anaerolineae Anaerolineaceae_unclassified -0.0060423 0.0105169 -0.5745358 0.5904867
Otu1419 Anaerolineae Thermomarinilinea -0.0048339 0.0084135 -0.5745358 0.5904867
Otu2497 Anaerolineae Pelolinea -0.0048339 0.0084135 -0.5745358 0.5904867
Otu3589 Anaerolineae uncultured -0.0048339 0.0084135 -0.5745358 0.5904867
Otu1558 Anaerolineae Anaerolineaceae_unclassified -0.0036254 0.0042200 -0.8590967 0.4295406
Otu1863 Anaerolineae Pelolinea -0.0036254 0.0042200 -0.8590967 0.4295406
Otu2789 Anaerolineae Thermomarinilinea -0.0024169 0.0042068 -0.5745358 0.5904867
Otu4340 Anaerolineae Anaerolineaceae_unclassified -0.0024169 0.0042068 -0.5745358 0.5904867
Otu3623 Anaerolineae Thermomarinilinea -0.0024169 0.0042068 -0.5745358 0.5904867
Otu1064 SAR202_clade SAR202_clade_ge -0.0020728 0.0128145 -0.1617557 0.8778315
Otu1579 SAR202_clade SAR202_clade_ge -0.0012945 0.0128349 -0.1008584 0.9235824
Otu3712 Anaerolineae uncultured -0.0012919 0.0043048 -0.3001126 0.7761680
Otu2790 Anaerolineae Thermomarinilinea -0.0012085 0.0021034 -0.5745358 0.5904867
Otu3607 Anaerolineae uncultured -0.0012085 0.0021034 -0.5745358 0.5904867
Otu1577 SAR202_clade SAR202_clade_ge -0.0009643 0.0027703 -0.3480925 0.7419469
Otu2592 SAR202_clade SAR202_clade_ge -0.0006367 0.0043341 -0.1469078 0.8889445
Otu2381 Anaerolineae uncultured -0.0006367 0.0043341 -0.1469078 0.8889445
Otu1149 SAR202_clade SAR202_clade_ge -0.0004881 0.0065115 -0.0749568 0.9431557
Otu4286 SAR202_clade SAR202_clade_ge -0.0004881 0.0065115 -0.0749568 0.9431557
Otu2632 SAR202_clade SAR202_clade_ge -0.0003254 0.0043410 -0.0749568 0.9431557
Otu2591 SAR202_clade SAR202_clade_ge -0.0003184 0.0021670 -0.1469078 0.8889445
Otu4287 SAR202_clade SAR202_clade_ge -0.0001627 0.0021705 -0.0749568 0.9431557
Table A4 Correlation Data of Chloroflexi ASVs Abundance with Oxygen Concentration
Class Genus Estimate Std. Error t value Pr(>|t|) (p-value)
Asv1749 D_2__Anaerolineae D_5__uncultured -0.1848957 0.3218175 -0.5745358 0.5904867
Asv408 D_2__Anaerolineae D_5__uncultured -0.1764060 0.1570152 -1.1234959 0.3122557
Asv1108 D_2__Anaerolineae D_5__uncultured -0.1646951 0.1413959 -1.1647802 0.2966535
Asv208 D_2__Anaerolineae D_5__uncultured -0.1439410 0.1112113 -1.2943021 0.2521126
Asv1071 D_2__Anaerolineae D_5__uncultured -0.1220553 0.2124416 -0.5745358 0.5904867
Asv428 D_2__Anaerolineae D_5__uncultured -0.1061416 0.0879361 -1.2070308 0.2814027
Asv114 D_2__JG30-KF-CM66 D_5__ -0.0969221 0.1218861 -0.7951862 0.4625657
Asv800 D_2__SAR202 clade D_5__ -0.0875482 0.2864534 -0.3056281 0.7722028
Asv161 D_2__Anaerolineae D_5__uncultured -0.0592150 0.1030657 -0.5745358 0.5904867
Asv1095 D_2__Anaerolineae NA -0.0580065 0.1009624 -0.5745358 0.5904867
Asv1943 D_2__Anaerolineae D_5__uncultured -0.0531726 0.0925488 -0.5745358 0.5904867
Asv271 D_2__Anaerolineae D_5__uncultured -0.0483387 0.0841353 -0.5745358 0.5904867
Asv1939 D_2__Anaerolineae D_5__Longilinea -0.0476310 0.0688397 -0.6919125 0.5198017
Asv490 D_2__Anaerolineae D_5__uncultured -0.0452141 0.0649448 -0.6961928 0.5173357
Asv1693 D_2__Anaerolineae D_5__uncultured -0.0447133 0.0524914 -0.8518223 0.4332067
Asv1505 D_2__Anaerolineae D_5__uncultured -0.0398795 0.0694116 -0.5745358 0.5904867
Asv1423 D_2__Anaerolineae D_5__Longilinea -0.0398795 0.0694116 -0.5745358 0.5904867
Asv2324 D_2__Anaerolineae NA -0.0386710 0.0673082 -0.5745358 0.5904867
Asv473 D_2__Anaerolineae D_5__uncultured -0.0350456 0.0609981 -0.5745358 0.5904867
Asv1664 D_2__Anaerolineae D_5__uncultured -0.0277948 0.0483778 -0.5745358 0.5904867
Asv555 D_2__Anaerolineae D_5__uncultured -0.0265863 0.0462744 -0.5745358 0.5904867
Asv1144 D_2__SAR202 clade D_5__ -0.0252671 0.1043155 -0.2422180 0.8182327
Asv2063 D_2__Anaerolineae NA -0.0205440 0.0357575 -0.5745358 0.5904867
Asv2315 D_2__Anaerolineae D_5__uncultured -0.0205440 0.0357575 -0.5745358 0.5904867
Asv477 D_2__Anaerolineae D_5__uncultured -0.0193355 0.0336541 -0.5745358 0.5904867
Asv1163 D_2__Anaerolineae D_5__uncultured -0.0169186 0.0294474 -0.5745358 0.5904867
Asv134 D_2__Anaerolineae NA -0.0157101 0.0273440 -0.5745358 0.5904867
Asv1282 D_2__Anaerolineae D_5__uncultured -0.0120847 0.0210338 -0.5745358 0.5904867
Asv1046 D_2__Anaerolineae D_5__uncultured -0.0120847 0.0210338 -0.5745358 0.5904867
Asv1142 D_2__JG30-KF-CM66 D_5__ -0.0112835 0.0618942 -0.1823031 0.8625058
Asv590 D_2__Anaerolineae D_5__uncultured -0.0108762 0.0189304 -0.5745358 0.5904867
Asv1003 D_2__SAR202 clade D_5__ -0.0108762 0.0189304 -0.5745358 0.5904867
Asv1234 D_2__SAR202 clade D_5__ -0.0096677 0.0168271 -0.5745358 0.5904867
Asv1794 D_2__Anaerolineae D_5__uncultured -0.0084593 0.0147237 -0.5745358 0.5904867
Asv1473 D_2__SAR202 clade NA -0.0084593 0.0147237 -0.5745358 0.5904867
Asv2247 D_2__JG30-KF-CM66 D_5__ -0.0084593 0.0147237 -0.5745358 0.5904867
Asv1979 D_2__Anaerolineae D_5__uncultured -0.0070038 0.0476747 -0.1469078 0.8889445
Asv2034 D_2__SAR202 clade D_5__ -0.0058137 0.0193716 -0.3001126 0.7761680
Asv496 D_2__Dehalococcoidia NA -0.0048339 0.0084135 -0.5745358 0.5904867
Asv400 D_2__uncultured D_5__ -0.0048339 0.0084135 -0.5745358 0.5904867
Asv1266 D_2__SAR202 clade D_5__ -0.0045554 0.0607736 -0.0749568 0.9431557
Asv1862 D_2__Anaerolineae D_5__uncultured -0.0041386 0.0281714 -0.1469078 0.8889445
Asv1289 D_2__Anaerolineae D_5__uncultured -0.0022777 0.0303868 -0.0749568 0.9431557
Asv2081 D_2__Anaerolineae D_5__uncultured -0.0012085 0.0021034 -0.5745358 0.5904867
Asv1260 D_2__Anaerolineae D_5__uncultured -0.0009551 0.0065011 -0.1469078 0.8889445
Asv341 D_2__JG30-KF-CM66 D_5__ -0.0004881 0.0065115 -0.0749568 0.9431557
Asv1886 D_2__SAR202 clade D_5__ 0.1614351 0.2011515 0.8025547 0.4586642
Table A5 Correlation Data of Chloroflexi OTUs Abundance with Hydrogen Sulfide Concentration
Class Genus Estimate Std. Error t value Pr(>|t|) (p-value)
Otu0181 SAR202_clade SAR202_clade_ge -2.4575728 3.3035421 -0.7439205 0.4903847
Otu0217 Anaerolineae Anaerolineaceae_unclassified -0.9242423 2.0042270 -0.4611465 0.6640586
Otu0607 Anaerolineae Thermomarinilinea -0.1124721 0.4212890 -0.2669714 0.8001524
Otu1064 SAR202_clade SAR202_clade_ge -0.0796436 0.1448049 -0.5500061 0.6059809
Otu1579 SAR202_clade SAR202_clade_ge -0.0796436 0.1448049 -0.5500061 0.6059809
Otu1149 SAR202_clade SAR202_clade_ge -0.0341330 0.0740614 -0.4608737 0.6642415
Otu4286 SAR202_clade SAR202_clade_ge -0.0341330 0.0740614 -0.4608737 0.6642415
Otu0551 Anaerolineae Thermomarinilinea -0.0280166 0.4433827 -0.0631883 0.9520649
Otu2381 Anaerolineae uncultured -0.0227553 0.0493743 -0.4608737 0.6642415
Otu2592 SAR202_clade SAR202_clade_ge -0.0227553 0.0493743 -0.4608737 0.6642415
Otu2632 SAR202_clade SAR202_clade_ge -0.0227553 0.0493743 -0.4608737 0.6642415
Otu3712 Anaerolineae uncultured -0.0227553 0.0493743 -0.4608737 0.6642415
Otu1577 SAR202_clade SAR202_clade_ge -0.0227553 0.0309087 -0.7362104 0.4946701
Otu2591 SAR202_clade SAR202_clade_ge -0.0113777 0.0246871 -0.4608737 0.6642415
Otu4287 SAR202_clade SAR202_clade_ge -0.0113777 0.0246871 -0.4608737 0.6642415
Otu3623 Anaerolineae Thermomarinilinea 0.0032080 0.0503917 0.0636606 0.9517072
Otu3607 Anaerolineae uncultured 0.0552843 0.0049066 11.2673382 0.0000962
Otu2790 Anaerolineae Thermomarinilinea 0.0552843 0.0049066 11.2673382 0.0000962
Otu1558 Anaerolineae Anaerolineaceae_unclassified 0.0584922 0.0454851 1.2859653 0.2547855
Otu1863 Anaerolineae Pelolinea 0.0991909 0.0280249 3.5393861 0.0165734
Otu2789 Anaerolineae Thermomarinilinea 0.1105686 0.0098132 11.2673382 0.0000962
Otu4340 Anaerolineae Anaerolineaceae_unclassified 0.1105686 0.0098132 11.2673382 0.0000962
Otu1983 Anaerolineae Thermomarinilinea 0.1185885 0.1161660 1.0208533 0.3541507
Otu1147 Anaerolineae Thermomarinilinea 0.1203422 0.1022047 1.1774631 0.2920002
Otu1246 Anaerolineae Thermomarinilinea 0.1983818 0.0560498 3.5393861 0.0165734
Otu3589 Anaerolineae uncultured 0.2211371 0.0196264 11.2673382 0.0000962
Otu1419 Anaerolineae Thermomarinilinea 0.2211371 0.0196264 11.2673382 0.0000962
Otu2497 Anaerolineae Pelolinea 0.2211371 0.0196264 11.2673382 0.0000962
Otu1851 Anaerolineae Anaerolineaceae_unclassified 0.2764214 0.0245330 11.2673382 0.0000962
Otu0662 Anaerolineae Thermomarinilinea 0.5528428 0.0490660 11.2673382 0.0000962
Otu1028 Anaerolineae Thermomarinilinea 0.6081271 0.0539726 11.2673382 0.0000962
Otu0799 Anaerolineae Pelolinea 0.6618074 0.1140109 5.8047714 0.0021396
Otu0215 Anaerolineae Thermomarinilinea 7.3528091 0.6525773 11.2673382 0.0000962
Otu0195 Anaerolineae Anaerolineaceae_unclassified 8.6796317 0.7703356 11.2673382 0.0000962
Table A6 Correlation Data of Chloroflexi ASVs Abundance with Hydrogen Sulfide Concentration
Class Genus Estimate Std. Error t value Pr(>|t|) (p-value)
Asv800 D_2__SAR202 clade D_5__ -2.9695672 3.0816822 -0.9636189 0.3794937
Asv1886 D_2__SAR202 clade D_5__ -2.2758300 2.2620814 -1.0060778 0.3605552
Asv1108 D_2__Anaerolineae D_5__uncultured -1.2574021 1.7629163 -0.7132512 0.5075876
Asv408 D_2__Anaerolineae D_5__uncultured -1.1503411 1.9735619 -0.5828756 0.5852742
Asv208 D_2__Anaerolineae D_5__uncultured -1.1127006 1.4059595 -0.7914172 0.4645707
Asv428 D_2__Anaerolineae D_5__uncultured -1.0569670 1.0591512 -0.9979378 0.3641245
Asv1144 D_2__SAR202 clade D_5__ -0.6485262 1.1827890 -0.5483025 0.6070659
Asv114 D_2__JG30-KF-CM66 D_5__ -0.6042138 1.4769566 -0.4090938 0.6994049
Asv1939 D_2__Anaerolineae D_5__Longilinea -0.5119943 0.8044173 -0.6364785 0.5524571
Asv490 D_2__Anaerolineae D_5__uncultured -0.4892390 0.7585529 -0.6449637 0.5473730
Asv1142 D_2__JG30-KF-CM66 D_5__ -0.3868402 0.6996938 -0.5528707 0.6041591
Asv1266 D_2__SAR202 clade D_5__ -0.3185743 0.6912399 -0.4608737 0.6642415
Asv1979 D_2__Anaerolineae D_5__uncultured -0.2503083 0.5431170 -0.4608737 0.6642415
Asv2063 D_2__Anaerolineae NA -0.1934201 0.4196813 -0.4608737 0.6642415
Asv1289 D_2__Anaerolineae D_5__uncultured -0.1592871 0.3456199 -0.4608737 0.6642415
Asv1862 D_2__Anaerolineae D_5__uncultured -0.1479095 0.3209328 -0.4608737 0.6642415
Asv1046 D_2__Anaerolineae D_5__uncultured -0.1137765 0.2468714 -0.4608737 0.6642415
Asv2034 D_2__SAR202 clade D_5__ -0.1023989 0.2221842 -0.4608737 0.6642415
Asv1693 D_2__Anaerolineae D_5__uncultured -0.0964323 0.6505274 -0.1482371 0.8879484
Asv1260 D_2__Anaerolineae D_5__uncultured -0.0341330 0.0740614 -0.4608737 0.6642415
Asv341 D_2__JG30-KF-CM66 D_5__ -0.0341330 0.0740614 -0.4608737 0.6642415
Asv2081 D_2__Anaerolineae D_5__uncultured -0.0113777 0.0246871 -0.4608737 0.6642415
Asv2247 D_2__JG30-KF-CM66 D_5__ 0.0112279 0.1763709 0.0636606 0.9517072
Asv134 D_2__Anaerolineae NA 0.0208518 0.3275459 0.0636606 0.9517072
Asv477 D_2__Anaerolineae D_5__uncultured 0.0256637 0.4031335 0.0636606 0.9517072
Asv1664 D_2__Anaerolineae D_5__uncultured 0.0368916 0.5795044 0.0636606 0.9517072
Asv473 D_2__Anaerolineae D_5__uncultured 0.0465155 0.7306794 0.0636606 0.9517072
Asv1943 D_2__Anaerolineae D_5__uncultured 0.0705752 1.1086170 0.0636606 0.9517072
Asv400 D_2__uncultured D_5__ 0.2211371 0.0196264 11.2673382 0.0000962
Asv496 D_2__Dehalococcoidia NA 0.2211371 0.0196264 11.2673382 0.0000962
Asv1473 D_2__SAR202 clade NA 0.3869900 0.0343462 11.2673382 0.0000962
Asv1794 D_2__Anaerolineae D_5__uncultured 0.3869900 0.0343462 11.2673382 0.0000962
Asv1234 D_2__SAR202 clade D_5__ 0.4422742 0.0392528 11.2673382 0.0000962
Asv590 D_2__Anaerolineae D_5__uncultured 0.4975585 0.0441594 11.2673382 0.0000962
Asv1003 D_2__SAR202 clade D_5__ 0.4975585 0.0441594 11.2673382 0.0000962
Asv1282 D_2__Anaerolineae D_5__uncultured 0.5528428 0.0490660 11.2673382 0.0000962
Asv1163 D_2__Anaerolineae D_5__uncultured 0.7739799 0.0686923 11.2673382 0.0000962
Asv2315 D_2__Anaerolineae D_5__uncultured 0.9398327 0.0834121 11.2673382 0.0000962
Asv555 D_2__Anaerolineae D_5__uncultured 1.2162541 0.1079451 11.2673382 0.0000962
Asv2324 D_2__Anaerolineae NA 1.7690969 0.1570111 11.2673382 0.0000962
Asv1423 D_2__Anaerolineae D_5__Longilinea 1.8243812 0.1619177 11.2673382 0.0000962
Asv1505 D_2__Anaerolineae D_5__uncultured 1.8243812 0.1619177 11.2673382 0.0000962
Asv271 D_2__Anaerolineae D_5__uncultured 2.2113711 0.1962638 11.2673382 0.0000962
Asv1095 D_2__Anaerolineae NA 2.6536454 0.2355166 11.2673382 0.0000962
Asv161 D_2__Anaerolineae D_5__uncultured 2.7089297 0.2404232 11.2673382 0.0000962
Asv1071 D_2__Anaerolineae D_5__uncultured 5.5837121 0.4955662 11.2673382 0.0000962
Asv1749 D_2__Anaerolineae D_5__uncultured 8.4584946 0.7507092 11.2673382 0.0000962